
    Machine Learning based on Probabilistic Models Applied to Medical Data: The Case of Prostate Cancer

    The growth in the volume of data held by companies makes it difficult for analysts to extract hidden knowledge from that data. Many existing models focus on notions of distance while ignoring conditional probability density. This study focuses on segmentation using mixture models and Bayesian networks for medical data mining: as enterprise data grows large, data mining and classification methods can be applied to make sense of it. We designed models with different architectures, applied them to a medical database, and implemented the algorithms on real data. The objective is to classify individuals according to the conditional probability density of random variables, and to identify causal relationships between traits using tests of conditional independence and a correlation measure, both based on χ2. After a brief illustration of several models (decision tree, SVM, K-means, Bayes), we applied our method to data from a case-control epidemiological study of prostate cancer conducted at the University of Kinshasa university clinics. After interpreting and discussing the results, we found that our model classifies a new individual with an accuracy of 96%.
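    The χ2-based independence testing mentioned in the abstract can be illustrated with a minimal sketch. The contingency-table counts and trait names below are invented for illustration and are not from the study; the function computes the standard Pearson χ2 statistic for an r × c table.

```python
# Hypothetical sketch: Pearson chi-squared statistic for testing
# independence between two categorical traits. Counts are synthetic.
def chi2_statistic(table):
    """Pearson chi-squared statistic for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Toy counts: trait A (rows) against trait B (columns)
table = [[30, 10],
         [12, 28]]
print(round(chi2_statistic(table), 2))  # → 16.24
```

    A large statistic relative to the χ2 distribution with (r-1)(c-1) degrees of freedom suggests the two traits are not independent; the study applies such tests conditionally to identify causal links between traits.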

    Cost-effective and Low-complexity Non-constrained Workflow Scheduling for Cloud Computing Environment

    Cloud computing has the merit of being a fast and cost-effective platform for executing scientific workflow applications. Such applications arise in many domains, including security, astronomy, and other sciences, and their large, complex structure makes them computationally intensive. The key to executing them successfully lies in task-resource mapping; however, task-resource mapping in a cloud environment is NP-complete, and finding a schedule that satisfies users' quality-of-service requirements remains difficult. Although many studies have proposed algorithms that address this issue by optimizing objectives such as makespan, cost, and energy, there is still considerable room for improvement: some of them fail to deliver scientific workflow scheduling algorithms with low time complexity and low runtime. In this paper, we propose a non-constrained scientific workflow scheduling algorithm for cost minimization with low runtime and low time complexity. Since the proposed algorithm is a list scheduling algorithm, its success depends on properly selecting, for each task, a computing resource and its operating CPU frequency, using the maximum cost difference and the minimum cost-execution difference from the mean. Our algorithm achieves almost the same cost reduction as some of the current state of the art while having lower complexity and a shorter runtime.
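    The general shape of cost-oriented list scheduling can be sketched as follows. This is not the paper's algorithm (its mean-based cost-difference selection rule is not reproduced here); it is a simplified greedy mapper with invented resource names, frequencies, and prices, visiting tasks in a precedence-respecting order and assigning each to its cheapest (resource, frequency) option.

```python
# Hypothetical sketch of list scheduling for cost minimization.
# All resource names, frequencies, and prices are illustrative only.

# (resource name, CPU frequency in GHz, cost per hour) options
OPTIONS = [
    ("vm-small", 1.8, 0.05),
    ("vm-medium", 2.4, 0.10),
    ("vm-large", 3.0, 0.20),
]

def schedule(tasks):
    """Map each task (name, workload in GHz-hours) to the option
    with the lowest execution cost (runtime * price)."""
    plan = {}
    for name, workload in tasks:  # assumed precedence-respecting order
        best = min(OPTIONS, key=lambda o: (workload / o[1]) * o[2])
        plan[name] = best[0]
    return plan

tasks = [("t1", 3.6), ("t2", 1.2)]
print(schedule(tasks))  # → {'t1': 'vm-small', 't2': 'vm-small'}
```

    A real scheduler would also track data dependencies and resource availability over time; because the problem is non-constrained (no deadline), the selection here reduces to pure cost minimization per task.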